Downsample waterlevel trends to daily min depth-to-water #87

Merged: jirhiker merged 2 commits into main from fix/waterlevel-trend-daily-downsample on Jun 26, 2026

@jirhiker (Member)

Problem

The nm_waterlevel_trends combine step crashed its child process at the native level: a ChildProcessCrashException with no Python traceback (an OOM kill). Root cause: pymannkendall.original_test is O(n²) in the number of observations, and high-frequency wells (continuous loggers with tens of thousands of readings) exhausted memory and CPU in the per-well test.
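To illustrate why a pairwise trend test scales quadratically, here is a minimal plain-Python sketch of the Mann-Kendall S statistic and the number of pairs it compares. It is not pymannkendall's implementation; `pair_count` and `mann_kendall_s` are hypothetical names used only for this illustration.

```python
def pair_count(n: int) -> int:
    """Number of (i, j) pairs with i < j among n observations."""
    return n * (n - 1) // 2


def mann_kendall_s(values: list[float]) -> int:
    """Mann-Kendall S: sum of sign(x_j - x_i) over all pairs i < j."""
    s = 0
    for i in range(len(values) - 1):
        for j in range(i + 1, len(values)):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # +1, 0, or -1
    return s


# A daily series of ~10^3 points needs about 5 * 10^5 comparisons;
# a raw logger series of 10^5 points needs about 5 * 10^9.
print(pair_count(1_000), pair_count(100_000))
```

Cutting n by a factor of 100 cuts the pair count by roughly 10,000, which is why bounding the series length matters more than tuning the test itself.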

Fix

Before the trend test, downsample each well's observations to one point per calendar day, keeping the daily minimum depth-to-water (the shallowest reading), via _daily_min_series. This bounds the Mann-Kendall cost and removes within-day sampling noise. A well measured continuously for years collapses from ~10⁴–10⁵ points to ~10³ daily points.

Changes

  • _daily_min_series(obs_list) — groups by UTC calendar day, keeps min DTW per day at the day's midnight epoch; returns (raw_count, sorted_daily_pairs).
  • Trend dumper uses it: record_count is now the daily point count used for the trend; new observation_count carries the raw reading count.
  • Qualification gate + Mann-Kendall/Theil-Sen classification unchanged (now applied to the daily series).
  • Updated TREND_METHOD_DESCRIPTION to document the daily-min step.
  • New test: same-day readings collapse to the min; observation_count vs record_count.
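A rough sketch of the daily-min step described in the list above, assuming each observation is a dict with an epoch-seconds timestamp and a depth-to-water value. The key names `time` and `value`, the function name `daily_min_series_sketch`, and the choice to skip (and not count) incomplete readings are assumptions for illustration; the actual _daily_min_series in ogc_features.py may differ.

```python
SECONDS_PER_DAY = 86_400


def daily_min_series_sketch(obs_list, time_key="time", value_key="value"):
    """Collapse observations to one (midnight_epoch, min_dtw) pair per UTC day.

    Returns (raw_count, sorted_daily_pairs).
    """
    raw_count = 0
    daily_min = {}
    for obs in obs_list:
        t = obs.get(time_key)
        dtw = obs.get(value_key)
        if t is None or dtw is None:
            continue  # assumption: incomplete readings are skipped and not counted
        raw_count += 1
        # Floor the epoch timestamp to that UTC day's midnight.
        midnight = int(t // SECONDS_PER_DAY) * SECONDS_PER_DAY
        if midnight not in daily_min or dtw < daily_min[midnight]:
            daily_min[midnight] = dtw
    return raw_count, sorted(daily_min.items())


readings = [
    {"time": 3_600, "value": 12.5},   # day 0
    {"time": 7_200, "value": 11.0},   # day 0, shallower
    {"time": 86_460, "value": 13.0},  # day 1
]
raw_count, daily_pairs = daily_min_series_sketch(readings)
```

In the terms of this PR, `raw_count` would feed observation_count and `len(daily_pairs)` would feed record_count.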

Verification

14 persister tests pass. Independent of #86 (touches only ogc_features.py + its test).

🤖 Generated with Claude Code

High-frequency wells (e.g. continuous loggers, tens of thousands of
readings) made the per-well Mann-Kendall test (O(n^2)) blow up memory and
get the combine child process OOM-killed (ChildProcessCrashException, no
Python traceback).

Before the trend test, reduce each well's observations to one point per
calendar day keeping the daily minimum depth-to-water (the shallowest
reading), via _daily_min_series. This bounds the MK cost and removes
within-day sampling noise.

- record_count is now the daily-point count used for the trend; new
  observation_count carries the raw reading count.
- qualification gate and classification unchanged (applied to daily
  points); updated TREND_METHOD_DESCRIPTION.
- test: daily-min downsampling (same-day readings collapse to the min).

14 persister tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions (bot) commented Jun 26, 2026

Your pull request is automatically being deployed to Dagster Cloud.

Location: die-orchestration
Link: View in Cloud
Updated: Jun 26, 2026 at 09:32 PM (UTC)

The trend combine rebuilt every observation into a ParameterRecord (and
each site into a SiteRecord) before computing — millions of objects for
statewide water-level data, on top of the already-large pickled inputs.

Consume the payload dicts directly: dump_waterlevel_trend_collection and
_daily_min_series now read dicts (obs.get / site.get) instead of getattr
on record objects, and the combine asset passes all_sites/all_timeseries
straight through. Cuts peak memory in the step that was OOM-crashing.

Other dumpers (summary, major-chemistry, timeseries) still use record
objects. Trend test helpers now build dicts. 14 persister tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
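
The difference between rebuilding record objects and reading payload dicts directly can be sketched as follows. `RecordSketch` is a hypothetical stand-in, not the project's ParameterRecord, and the `site_id`, `time`, and `value` keys are assumed for illustration only.

```python
from dataclasses import dataclass


@dataclass
class RecordSketch:
    """Hypothetical stand-in for a record object."""
    site_id: str
    time: float
    value: float


def shallowest_via_records(payload):
    """Old shape: wrap every observation dict in an object, then read attributes."""
    records = [RecordSketch(o["site_id"], o["time"], o["value"]) for o in payload]
    return min(getattr(r, "value") for r in records)


def shallowest_via_dicts(payload):
    """New shape: read the payload dicts directly, with no per-row objects."""
    return min(o.get("value") for o in payload if o.get("value") is not None)


payload = [
    {"site_id": "A", "time": 0.0, "value": 12.5},
    {"site_id": "A", "time": 60.0, "value": 11.0},
]
```

Both functions return the same answer; the second avoids allocating one extra object per observation, which adds up when the payload holds millions of rows.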

jirhiker merged commit b9ea78c into main on Jun 26, 2026
3 checks passed